An Analysis of Random Design Linear Regression

Authors

  • Daniel J. Hsu
  • Sham M. Kakade
  • Tong Zhang
Abstract

The random design setting for linear regression concerns estimators based on a random sample of covariate/response pairs. This work gives explicit bounds on the prediction error for the ordinary least squares estimator and the ridge regression estimator under mild assumptions on the covariate/response distributions. In particular, this work provides sharp results on the “out-of-sample” prediction error, as opposed to the “in-sample” (fixed design) error. Our analysis also explicitly reveals the effect of noise vs. modeling errors. The approach reveals a close connection to the more traditional fixed design setting, and our methods make use of recent advances in concentration inequalities (for vectors and matrices). We also describe an application of our results to fast least squares computations.
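The distinction between "in-sample" (fixed design) and "out-of-sample" (random design) prediction error can be illustrated with a small simulation. The sketch below is not taken from the paper: the Gaussian covariates, the well-specified linear model (so there is no modeling error), the normalization of the ridge objective, and the names `fit_ridge` and `simulate` are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam=0.0):
    """Ridge estimate for the normalized objective; lam=0 gives ordinary least squares."""
    n, d = X.shape
    # Solve (X^T X / n + lam * I) w = X^T y / n  (assumed normalization).
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    return np.linalg.solve(A, b)


def simulate(n=200, d=20, noise=0.5, lam=0.1, n_test=100_000, seed=0):
    """Return {estimator: (in_sample, out_of_sample)} excess prediction errors."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d) / np.sqrt(d)      # hypothetical true parameter
    X = rng.normal(size=(n, d))                   # random design: covariates are sampled
    y = X @ w_true + noise * rng.normal(size=n)   # well-specified model plus noise
    X_test = rng.normal(size=(n_test, d))         # fresh covariates from the same law

    results = {}
    for name, reg in (("ols", 0.0), ("ridge", lam)):
        w = fit_ridge(X, y, reg)
        # In-sample (fixed design): error averaged over the training covariates.
        in_sample = float(np.mean((X @ (w - w_true)) ** 2))
        # Out-of-sample: Monte Carlo estimate of the error on new covariates.
        out_of_sample = float(np.mean((X_test @ (w - w_true)) ** 2))
        results[name] = (in_sample, out_of_sample)
    return results


if __name__ == "__main__":
    for name, (ins, out) in simulate().items():
        print(f"{name}: in-sample={ins:.4f}, out-of-sample={out:.4f}")
```

Both quantities are excess errors relative to the true linear predictor, so they isolate the estimation error from the irreducible noise. Varying `n`, `d`, and `lam` shows how the gap between the two behaves in this toy setting.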




Journal:
  • CoRR

Volume abs/1106.2363

Publication date 2011